52 research outputs found

Semantic Wikis: Conclusions from Real-World Projects With Ylvi

Author: Popitsch N
Ross r
Schandl B
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2008

Interest-based RDF Update Propagation

Author: B Schandl
G Tummarello
K Voruganti
L Pellegrino
N Popitsch
P-A Chirita
R Verborgh
S Tramp
Publication date: 01/01/2015

Many LOD datasets, such as DBpedia and LinkedGeoData, are voluminous and process large numbers of requests from diverse applications. Many data products and services rely on full or partial local LOD replications to ensure faster querying and processing. While such replicas enhance the flexibility of information sharing and integration infrastructures, they also introduce data duplication with all the associated undesirable consequences. Because the original, authoritative datasets evolve, keeping replicas consistent and up to date requires frequent replacements at great cost. In this paper, we introduce an approach for interest-based RDF update propagation, which propagates only interesting parts of updates from the source to the target dataset. Effectively, this enables remote applications to 'subscribe' to relevant datasets and consistently reflect the necessary changes locally without the need to frequently replace the entire dataset (or a relevant subset). Our approach is based on a formal definition for graph-pattern-based interest expressions that is used to filter interesting parts of updates from the source. We implement the approach in the iRap framework and perform a comprehensive evaluation based on DBpedia Live updates, to confirm the validity and value of our approach.

Comment: 16 pages, Keywords: Change Propagation, Dataset Dynamics, Linked Data, Replicatio

arXiv.org e-Print Archive

Crossref

Fraunhofer-ePrints

Compression of Structured High-Throughput Sequencing Data

Author: ER Mardis
Fabien Campagne
Frederique Lisacek
H Li
H Li
James T. Robinson
Jill P. Mesirov
JK Pickrell
JR Shearstone
JT Robinson
Kevin C. Dorff
L Skrabanek
M Hsi-Yang Fritz
M Mangone
N Agrawal
N Popitsch
Nyasha Chambwe
SM Kielbasa
TD Wu
Publication venue: Public Library of Science (PLoS)
Publication date: 28/11/2012

Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to quickly adapt to the requirements of new sequencing or analysis methods (because they do not support schema evolution), or fail to provide state-of-the-art compression of the datasets. We have devised new approaches to store HTS data that support seamless data schema evolution and compress datasets substantially better than existing approaches. Building on these new approaches, we discuss and demonstrate how a multi-tier data organization can dramatically reduce the storage, computational and network burden of collecting, analyzing, and archiving large sequencing datasets. For instance, we show that spliced RNA-Seq alignments can be stored in less than 4% of the size of a BAM file with perfect data fidelity. Compared to the previous compression state of the art, these methods reduce dataset size by more than 40% when storing exome, gene expression or DNA methylation datasets. The approaches have been integrated in a comprehensive suite of software tools (http://goby.campagnelab.org) that support common analyses for a range of high-throughput sequencing assays.

National Center for Research Resources (U.S.) (Grant UL1 RR024996); Leukemia & Lymphoma Society of America (Translational Research Program Grant LLS 6304-11); National Institute of Mental Health (U.S.) (R01 MH086883

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

DSpace@MIT

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

FigShare

mRNA stability and m(6)A are major determinants of subcellular mRNA localization in neurons

Author: Ameres S.
Baranovskii A.
Breimann L.
Chekulaeva M.
Cherepanov V.
Dantsuji S.
Loedige I.
Mendonsa S.
Milek M.
Popitsch N.
Zerna N.
Publication venue: Cell Press
Publication date: 03/08/2023

For cells to perform their biological functions, they need to adopt specific shapes and form functionally distinct subcellular compartments. This is achieved in part via an asymmetric distribution of mRNAs within cells. Currently, the main model of mRNA localization involves specific sequences called "zipcodes" that direct mRNAs to their proper locations. However, while thousands of mRNAs localize within cells, only a few zipcodes have been identified, suggesting that additional mechanisms contribute to localization. Here, we assess the role of mRNA stability in localization by combining the isolation of the soma and neurites of mouse primary cortical and mESC-derived neurons, SLAM-seq, m(6)A-RIP-seq, the perturbation of mRNA destabilization mechanisms, and the analysis of multiple mRNA localization datasets. We show that depletion of mRNA destabilization elements, such as m(6)A, AU-rich elements, and suboptimal codons, functions as a mechanism that mediates the localization of mRNAs associated with housekeeping functions to neurites in several types of neurons

MDC Repository

Hijacking of transcriptional condensates by endogenous retroviruses

Author: Ameres S.
Asimi V.
Buschow R.
Cisse I.
Du M.
Fasching N.
Hetzel S.
Hnisz D.
Kretzmer H.
Mamde S.
Meierhofer D.
Meissner A.
Naderi J.
Niskanen H.
Popitsch N.
Riemenschneider C.
Sampath Kumar A.
Smith Z.
Timmermann B.
Walther M.
Weigert R.
Wittler L.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/08/2022

Most endogenous retroviruses (ERVs) in mammals are incapable of retrotransposition; therefore, why ERV derepression is associated with lethality during early development has been a mystery. Here, we report that rapid and selective degradation of the heterochromatin adapter protein TRIM28 triggers dissociation of transcriptional condensates from loci encoding super-enhancer (SE)-driven pluripotency genes and their association with transcribed ERV loci in murine embryonic stem cells. Knockdown of ERV RNAs or forced expression of SE-enriched transcription factors rescued condensate localization at SEs in TRIM28-degraded cells. In a biochemical reconstitution system, ERV RNA facilitated partitioning of RNA polymerase II and the Mediator coactivator into phase-separated droplets. In TRIM28 knockout mouse embryos, single-cell RNA-seq analysis revealed specific depletion of pluripotent lineages. We propose that coding and noncoding nascent RNAs, including those produced by retrotransposons, may facilitate ‘hijacking’ of transcriptional condensates in various developmental and disease contexts

MPG.PuRe

Analysis of exome data for 4293 trios suggests GPI-anchor biogenesis defects are a rare cause of developmental disorders.

Over 150 different proteins attach to the plasma membrane using glycosylphosphatidylinositol (GPI) anchors. Mutations in 18 genes that encode components of GPI-anchor biogenesis result in a phenotypic spectrum that includes learning disability, epilepsy, microcephaly, congenital malformations and mild dysmorphic features. To determine the incidence of GPI-anchor defects, we analysed the exome data from 4293 parent-child trios recruited to the Deciphering Developmental Disorders (DDD) study. All probands recruited had a neurodevelopmental disorder. We searched for variants in 31 genes linked to GPI-anchor biogenesis and detected rare biallelic variants in PGAP3, PIGN, PIGT (n=2), PIGO and PIGL, providing a likely diagnosis for six families. In five families, the variants were in a compound heterozygous configuration while in a consanguineous Afghani kindred, a homozygous c.709G>C; p.(E237Q) variant in PIGT was identified within 10-12 Mb of autozygosity. Validation and segregation analysis was performed using Sanger sequencing. Across the six families, five siblings were available for testing and in all cases variants co-segregated consistent with them being causative. In four families, abnormal alkaline phosphatase results were observed in the direction expected. FACS analysis of knockout HEK293 cells that had been transfected with wild-type or mutant cDNA constructs demonstrated that the variants in PIGN, PIGT and PIGO all led to reduced activity. Splicing assays, performed using leucocyte RNA, showed that a c.336-2A>G variant in PIGL resulted in exon skipping and p.D113fs*2. Our results strengthen recently reported disease associations, suggest that defective GPI-anchor biogenesis may explain ~0.15% of individuals with developmental disorders and highlight the benefits of data sharing

Southampton (e-Prints Soton)

Crossref

Oxford University Research Archive

St George's Online Research Archive

Factors influencing success of clinical genome sequencing across a broad spectrum of disorders

Author: Allan C
Aricescu AR
Attar M
Babbs C
Becq J
Beeson D
Bell JI
Bentley D
Bento C
Bignell P
Blair E
Broxholme J
Buck D
Buckle VJ
Bull K
Cais O
Cario H
Cazier J-B
Chapel H
Copley RR
Cornall R
Craft J
Dahan K
Davenport EE
Dendrou C
Devuyst O
Donnelly P
Fenwick AL
Fiddy S
Flint J
Fugger L
Gilbert RD
Goriely A
Green A
Greger IH
Grocock R
Gruszczyk AV
Hastings R
Hatton E
Higgs D
Hill A
Holmes C
Howard M
Hughes L
Humburg P
Humphray S
Johnson D
Kanapin A
Karpe F
Kingsbury Z
Kini U
Knight JC
Krohn J
Lamble S
Langman C
Lise S
Lonie L
Luck J
Lunter G
Martin HC
McCarthy D
McGowan SJ
McMullin MF
McVean G
Miller KA
Murray L
Nemeth AH
Nesbit MA
Nutt D
Ormondroyd E
Oturai AB
Pagnamenta A
Patel SY
Percy M
Petousi N
Piazza P
Piret SE
Polanco-Echeverry G
Popitsch N
Powrie F
Pugh C
Quek L
Ratcliffe PJ
Rimmer A
Robbins PA
Robson K
Russo A
Sahgal N
Schuh A
Silverman E
Simmons A
Sorensen PS
Sweeney E
Taylor J
Taylor JC
Thakker RV
Tomlinson I
Trebes A
Twigg SRF
Uhlig HH
Van Schouwenburg PA
Vyas P
Vyse T
Wall SA
Watkins H
Whyte MP
Wilkie AOM
Witty L
Wright B
Yau C
Publication venue: Springer Science and Business Media LLC
Publication date: 22/04/2015

To assess factors influencing the success of whole-genome sequencing for mainstream clinical diagnosis, we sequenced 217 individuals from 156 independent cases or families across a broad spectrum of disorders in whom previous screening had identified no pathogenic variants. We quantified the number of candidate variants identified using different strategies for variant calling, filtering, annotation and prioritization. We found that jointly calling variants across samples, filtering against both local and external databases, deploying multiple annotation tools and using familial transmission above biological plausibility contributed to accuracy. Overall, we identified disease-causing variants in 21% of cases, with the proportion increasing to 34% (23/68) for mendelian disorders and 57% (8/14) in family trios. We also discovered 32 potentially clinically actionable variants in 18 genes unrelated to the referral disorder, although only 4 were ultimately considered reportable. Our results demonstrate the value of genome sequencing for routine clinical diagnosis but also highlight many outstanding challenges

Oxford University Research Archive

Spiral - Imperial College Digital Repository

Structural and non-coding variants increase the diagnostic yield of clinical whole genome sequencing for rare diseases

Author: Allroggen H
Ansorge O
Babbs C
Banka S
Baños-Piñero B
Beeson D
Ben-Ami T
Bennett DL
Bento C
Blair E
Brasch-Andersen C
Bull KR
Calpena E
Camps C
Cario H
Cilliers D
Conti V
Dacal BD
Davies EG
Dhalla F
Dong Y
Dreau H
Dunford JE
Ferla M
Giacopuzzi E
Guerrini R
Harris AL
Hartley J
Hashim M
Hashimoto A
Hollander G
Hughes JR
Javaid K
Kaisaki PJ
Kane M
Kelly D
Kelly D
Kesim Y
Kini U
Knight SJL
Kreins AY
Kvikstad EM
Lange L
Langman CB
Lester T
Lines KE
Lord SR
Lu X
Lunter G
Mansour S
Manzur A
Maroofian R
Marsden B
Mason J
McGowan SJ
Mei D
Mlcochova H
Murakami Y
Németh AH
Okoli S
Ormondroyd E
Ousager LB
Pagnamenta AT
Palace J
Patel SY
Pentony MM
Popitsch N
Pugh C
Rad A
Ragoussis V
Ramesh A
Riva SG
Roberts I
Roy N
Salminen O
Sanders E
Schilling KD
Schuh AH
Schwessinger R
Scott C
Sen A
Smith C
Stevenson M
Taylor JC
Taylor JM
Thakker RV
Twigg SRF
Uhlig HH
van Wijk R
Vavoulis DV
Vona B
Wall S
Wang J
Watkins H
Wilkie AOM
Yu J
Zak J
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2023

BACKGROUND: Whole genome sequencing is increasingly being used for the diagnosis of patients with rare diseases. However, the diagnostic yields of many studies, particularly those conducted in a healthcare setting, are often disappointingly low, at 25–30%. This is in part because although entire genomes are sequenced, analysis is often confined to in silico gene panels or coding regions of the genome.
METHODS: We undertook WGS on a cohort of 122 unrelated rare disease patients and their relatives (300 genomes) who had been pre-screened by gene panels or arrays. Patients were recruited from a broad spectrum of clinical specialties. We applied a bioinformatics pipeline that would allow comprehensive analysis of all variant types. We combined established bioinformatics tools for phenotypic and genomic analysis with our novel algorithms (SVRare, ALTSPLICE and GREEN-DB) to detect and annotate structural, splice site and non-coding variants.
RESULTS: Our diagnostic yield was 43/122 cases (35%), although 47/122 cases (39%) were considered solved when taking novel candidate genes with supporting functional data into account. Structural, splice site and deep intronic variants contributed to 20/47 (43%) of our solved cases. Five genes that are novel, or were novel at the time of discovery, were identified, whilst a further three genes are putative novel disease genes with evidence of causality. We identified variants of uncertain significance in a further fourteen candidate genes. The phenotypic spectrum associated with RMND1 was expanded to include polymicrogyria. Two patients with secondary findings in FBN1 and KCNQ1 were confirmed to have previously unidentified Marfan and long QT syndromes, respectively, and were referred for further clinical interventions. Clinical diagnoses were changed in six patients and treatment adjustments made for eight individuals, which for five patients was considered life-saving.
CONCLUSIONS: Genome sequencing is increasingly being considered as a first-line genetic test in routine clinical settings and can make a substantial contribution to rapidly identifying a causal aetiology for many patients, shortening their diagnostic odyssey. We have demonstrated that structural, splice site and intronic variants make a significant contribution to diagnostic yield and that comprehensive analysis of the entire genome is essential to maximise the value of clinical genome sequencing.

UCL Discovery

ReliableGenome : annotation of genomic regions with high/low variant calling concordance

Author: Popitsch N
Schuh A
Taylor J
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2016

The increasing adoption of clinical whole-genome resequencing (WGS) demands highly accurate and reproducible variant calling (VC) methods. The observed discordance between state-of-the-art VC pipelines, however, indicates that the current practice still suffers from non-negligible numbers of false positive and negative SNV and INDEL calls that were shown to be enriched among discordant calls but also in genomic regions with low sequence complexity. Here, we describe our method ReliableGenome (RG) for partitioning genomes into high and low concordance regions with respect to a set of surveyed VC pipelines. Our method combines call sets derived by multiple pipelines from arbitrary numbers of datasets and interpolates expected concordance for genomic regions without data. By applying RG to 219 deep human WGS datasets, we demonstrate that VC concordance depends predominantly on genomic context rather than the actual sequencing data, which manifests in high recurrence of regions that can/cannot be reliably genotyped by a single method. This enables the application of pre-computed regions to other data created with comparable sequencing technology and software. RG outperforms comparable efforts in predicting VC concordance and false positive calls in low-concordance regions, which underlines its usefulness for variant filtering, annotation and prioritization. RG allows focusing resource-intensive algorithms (e.g., consensus calling methods) on the smaller, discordant share of the genome (20-30%), which might result in increased overall accuracy at reasonable costs. Our method and analysis of discordant calls may further be useful for development, benchmarking and optimization of VC algorithms and for the relative comparison of call sets between different studies/pipelines. RG was implemented in Java; source code and binaries are freely available for non-commercial use at https://github.com/popitsch/wtchg-rg/

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Oxford University Research Archive